For data \(\{(\boldsymbol{x}_i, y_i); i = 1, \ldots, N\}\) with \(\boldsymbol{x}_i \in \mathbb{R}^d\) and \(y_i \in \mathbb{R}\),
\[y_i = f(\boldsymbol{x}_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2_n)\]
We propose an underlying function,
\[f(\cdot) \sim\mathcal{GP} \left( \mu(\boldsymbol{x};\boldsymbol\theta_\mu), k(\boldsymbol{x}, \boldsymbol{x'}; \boldsymbol{\theta}_k) \right)\]
where \(\mu(\cdot)\) is the mean function and \(k(\cdot)\) is the covariance kernel function, with hyperparameters \(\boldsymbol\theta_\mu\) and \(\boldsymbol\theta_k\), respectively.
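As a concrete sketch of the model above, the following plain-NumPy code (not the paper's GPyTorch implementation) computes the standard GP posterior under the noise model \(y_i = f(\boldsymbol{x}_i) + \varepsilon_i\); the zero mean and squared-exponential kernel with unit hyperparameters are illustrative assumptions:

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x'); an illustrative kernel choice."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, sigma_n=0.1, **kern):
    """Posterior mean and variance of f at x_test under a zero-mean GP prior."""
    K = se_kernel(x_train, x_train, **kern) + sigma_n**2 * np.eye(len(x_train))
    Ks = se_kernel(x_train, x_test, **kern)
    Kss = se_kernel(x_test, x_test, **kern)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 30))
y = np.sin(x) + rng.normal(0, 0.1, 30)   # y_i = f(x_i) + eps_i, sigma_n = 0.1
xs = np.linspace(0, 10, 50)
mu, var = gp_posterior(x, y, xs)
```

The posterior variance `var` is what gives the uncertainty quantification discussed below.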
The ThunderKAT data comprise \(N = 6,394\) light curves, with \(n_i \in [1, 166]\) observations per curve, spanning \([1, 1296]\) days.
Light curves are (usually) univariate with a 1-dimensional input (time), so they should not cause a computational bottleneck.
Simulated light curves have \(n_i = 500\) observations.
Estimating the uncertainties around PSDs and \(\boldsymbol\theta\)s is crucial, which favours using GPs.
For light curves from MeerKAT:
For the simulated datasets, I can set arbitrary values for parameter ground truths.
These need to feed into setting the priors.
The authors suggest standardising the data, i.e., z-scores, to improve inference.
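A minimal sketch of the suggested standardisation (z-scores), assuming the flux measurements sit in a NumPy array; remember to map GP predictions back to the original scale afterwards:

```python
import numpy as np

flux = np.array([2.0, 4.0, 6.0, 8.0])     # hypothetical flux measurements
mean, std = flux.mean(), flux.std()
z = (flux - mean) / std                   # z-scores: zero mean, unit s.d.
flux_back = z * std + mean                # invert to recover the original scale
```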
Need to consider: Smoothness, lengthscales, periodicity, outliers and tails, asymmetry, and stationarity.
Authors suggest “kernel composition”.
For the simulation study: add SE and periodic kernels, since the physics suggests these components will be present additively.
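A sketch of this additive composition, using an exponential-sine-squared form for the periodic kernel (the hyperparameter values are illustrative assumptions); the sum of two valid kernels is itself a valid kernel:

```python
import numpy as np

def se(d, ls=2.0):
    """Squared-exponential kernel on time differences d."""
    return np.exp(-0.5 * (d / ls) ** 2)

def periodic(d, ls=1.0, period=3.0):
    """Exponential-sine-squared ("periodic") kernel on time differences d."""
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls**2)

t = np.linspace(0, 10, 40)
D = t[:, None] - t[None, :]
K = se(D) + periodic(D)            # additive kernel composition

# K should be symmetric positive semi-definite, as any valid covariance is
eigvals = np.linalg.eigvalsh(K)
```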
For ThunderKAT: lengthscale priors should reflect the size of the data gaps and the total length of each light curve.
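One hedged way to turn this into numbers (the specific statistics are my assumption, not from the paper): bound the lengthscale prior below by the typical sampling gap, where shorter lengthscales are unidentifiable, and above by the total baseline, beyond which the data cannot constrain them.

```python
import numpy as np

# Hypothetical observation times for one light curve, in days
t = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 40.0])

gaps = np.diff(np.sort(t))
lower = np.median(gaps)       # typical gap: lengthscale prior lower bound
upper = t.max() - t.min()     # total baseline: lengthscale prior upper bound
```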
Transients are by definition not stationary!
Authors suggest setting up a simple non-GP baseline model for comparison.
Simulation study: Lomb-Scargle periodogram of a fully sampled simulated light curve.
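A sketch of this baseline using `scipy.signal.lombscargle`, which expects angular frequencies; the simulated sinusoid, noise level, and frequency grid are illustrative assumptions:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 100, 500))     # irregularly sampled times, n_i = 500
f_true = 0.1                              # true frequency, cycles/day (illustrative)
y = np.sin(2 * np.pi * f_true * t) + rng.normal(0, 0.2, t.size)

freqs = np.linspace(0.01, 0.5, 2000)      # trial frequencies, cycles/day
pgram = lombscargle(t, y - y.mean(), 2 * np.pi * freqs)  # angular frequencies
f_peak = freqs[np.argmax(pgram)]          # periodogram peak should sit near f_true
```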
ThunderKAT analysis is still underway.
Comments on Paper
GPyTorch code is provided.